Sensitivity Analysis of Core Specialization Techniques

Authors

  • Prathmesh Kallurkar
  • Smruti R. Sarangi
Abstract

The instruction footprint of OS-intensive workloads such as web servers, database servers, and file servers typically exceeds the size of the instruction cache (32 KB). Consequently, such workloads incur many i-cache misses, which drastically reduce their performance. Several papers [6, 8, 5, 2, 3] have proposed improving the performance of such workloads using core specialization, a scheme in which tasks with different instruction footprints execute on different cores. In this report, we study the performance of five state-of-the-art core specialization techniques, SelectiveOffload [6], FlexSC [8], DisAggregateOS [5], SLICC [2], and SchedTask [3], under different system parameters. Our studies show that for a suite of 8 popular OS-intensive workloads, SchedTask performs best in all evaluated configurations.

∗The author contributed to this work while at Indian Institute of Technology Delhi

arXiv:1708.03900v1 [cs.AR] 13 Aug 2017

1 Multi-programmed Workloads

We compare the impact of all core specialization techniques on a server that executes multiple OS-intensive applications. Table 1 shows the constituent benchmarks and their workloads for each multi-programmed workload, and Figure 1 shows the impact of the different core specialization techniques on the weighted instruction throughput of each multi-programmed workload. We start by allocating an equal number of cores to each benchmark and then let the scheduling techniques decide the appropriate number of cores for executing the constituent tasks of each multi-programmed workload.

Bag ID   Constituent benchmarks              Workload of individual benchmark
MPW-A    DSS, FileSrv                        1X
MPW-B    Apache, OLTP                        1X
MPW-C    Apache, DSS, FileSrv, Iscp          0.5X
MPW-D    Apache, DSS, Find, OLTP             0.5X
MPW-E    Find, FileSrv, Iscp, Oscp           0.5X
MPW-F    Apache, FileSrv, MailSrvIO, OLTP    0.5X

Table 1: Constituent benchmarks of multi-programmed workloads

[Figure 1 (bar chart). x-axis: MPW-A, MPW-B, MPW-C, MPW-D, MPW-E, MPW-F, gmean. y-axis: change in instruction throughput (%), from -10 to 40. Series: SelectiveOffload, FlexSC, DisAggregateOS, SLICC, SchedTask. Off-scale bar labels: -13, -19.]

Figure 1: Impact of different techniques on the instruction throughput of a system executing multi-programmed workloads

The mean improvement in the weighted instruction throughput for these techniques is: SelectiveOffload (21.48%), FlexSC (2.26%), DisAggregateOS (9.47%), SLICC (5.64%), and SchedTask (23.94%). The primary point to note from Figure 1 is that SLICC performs poorly for multi-programmed workloads. This is an artifact of SLICC's thread decomposition policy, which does not group common portions of OS execution across different applications. FlexSC, DisAggregateOS, and SchedTask group system calls based on their IDs. Hence, for these techniques, performance on a multi-programmed workload correlates strongly with performance on its constituent benchmarks.

2 Instruction Cache Size

Table 2 shows the impact of the i-cache size on the i-cache hit rate and the instruction throughput achieved by all core specialization techniques. We evaluate all techniques for the following three i-cache configurations: 4-way 16 KB, 4-way 32 KB, and 4-way 64 KB. A baseline system with a smaller i-cache incurs more cache misses, so the core specialization techniques have more room to improve instruction throughput. Our proposed technique, SchedTask, improves throughput by 25%, 23%, and 22% over the baseline for a 16 KB, 32 KB, and 64 KB i-cache system, respectively. This is an improvement of 13, 12, and 7 percentage points, respectively, over the best state-of-the-art technique at each size.

iSize  Technique         Find     Iscp     Oscp     Apache   DSS      FileSrv  MailSrvIO  OLTP     geom. mean
16 KB  SelectiveOffload  1/10     1/21     1/12     1/31     3/5      0/6      1/0        4/11     1/12
16 KB  FlexSC            7/-48    6/-40    6/-50    -1/12    1/6      1/25     2/12       2/10     3/-14
16 KB  DisAggregateOS    2/0      1/14     1/10     2/20     3/6      1/16     3/0        4/9      2/9
16 KB  SLICC             1/4      1/24     1/12     1/1      2/5      1/15     2/0        2/11     1/8
16 KB  SchedTask         2/11     1/40     1/23     2/44     2/10     1/34     2/28       3/17     1/25
32 KB  SelectiveOffload  2/7      2/21     1/8      2/27     3/5      1/4      3/0        3/9      2/10
32 KB  FlexSC            10/-51   7/-44    6/-56    -1/7     2/6      2/29     2/12       1/4      3/-18
32 KB  DisAggregateOS    3/-2     2/16     1/4      4/20     3/6      2/20     5/4        3/6      3/9
32 KB  SLICC             4/3      2/28     1/7      3/9      1/5      1/20     3/2        2/13     2/11
32 KB  SchedTask         4/7      3/39     1/15     4/38     3/10     2/44     4/28       3/12     3/23
64 KB  SelectiveOffload  3/6      3/22     2/6      4/26     0/5      1/5      3/-1       2/8      2/9
64 KB  FlexSC            8/-52    6/-45    4/-57    1/8      0/6      1/23     2/12       1/4      3/-19
64 KB  DisAggregateOS    5/-1     4/16     2/3      8/22     0/6      1/27     4/5        3/5      3/10
64 KB  SLICC             5/4      3/33     2/7      8/21     0/6      1/26     2/5        3/19     3/15
64 KB  SchedTask         5/6      4/39     2/13     8/37     0/11     1/36     3/28       2/13     3/22

Each cell is iHit/Perf. iSize is the size of the i-cache. iHit and Perf are the change (%) in the i-cache hit rate and the instruction throughput, respectively, relative to the baseline with the same i-cache size.

Table 2: Impact of the size of the instruction cache on the instruction cache hit rate and instruction throughput
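The margins quoted for the instruction cache study can be checked against the geometric-mean Perf values of Table 2. The sketch below is only an illustration of that arithmetic: the dictionary copies the geom. mean Perf column from the table, while the name `margin_over_best_rival` and the reading of each margin as a difference in percentage points against the best competing technique are assumptions made here, not something the report states explicitly.

```python
# Geometric-mean "Perf" values (change in instruction throughput, %) from Table 2.
# The dictionary layout and function name are illustrative assumptions.
GEOMEAN_PERF = {
    "16 KB": {"SelectiveOffload": 12, "FlexSC": -14, "DisAggregateOS": 9,
              "SLICC": 8, "SchedTask": 25},
    "32 KB": {"SelectiveOffload": 10, "FlexSC": -18, "DisAggregateOS": 9,
              "SLICC": 11, "SchedTask": 23},
    "64 KB": {"SelectiveOffload": 9, "FlexSC": -19, "DisAggregateOS": 10,
              "SLICC": 15, "SchedTask": 22},
}


def margin_over_best_rival(perf_by_technique, technique="SchedTask"):
    """Return (best rival, margin in percentage points) for `technique`."""
    rivals = {name: perf for name, perf in perf_by_technique.items()
              if name != technique}
    best_rival = max(rivals, key=rivals.get)
    return best_rival, perf_by_technique[technique] - rivals[best_rival]


for i_cache_size, perf in GEOMEAN_PERF.items():
    rival, margin = margin_over_best_rival(perf)
    print(f"{i_cache_size}: SchedTask leads {rival} by {margin} percentage points")
```

Under this reading, the best competing technique is SelectiveOffload for the 16 KB i-cache and SLICC for the 32 KB and 64 KB i-caches.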
3 Cache Configuration

Table 3 describes three cache configurations (Config1, Config2, and Config3) and their impact on the instruction throughput of all techniques. Config1 and Config2 have a two-level cache hierarchy, whereas Config3 has a three-level hierarchy. Since the performance benefit derived by a core specialization technique is directly proportional to the i-cache miss penalty, all techniques gain the least on Config2 and the most on Config1. Our proposed technique improves throughput by 24%, 21%, and 23% over the baseline for systems with the Config1, Config2, and Config3 cache configurations, respectively. This is a 7, 6, and 12 percentage point improvement, respectively, over the best existing techniques.
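The link between miss penalty and benefit can be illustrated with a simple additive stall model. This is a hedged sketch and not the simulator or methodology used in the report: the model (CPI = base CPI + i-cache misses per instruction × miss penalty), the function names, and every number in it are hypothetical. In this model the stall cycles saved per instruction scale linearly with the miss penalty, and the throughput gain grows with it.

```python
def stall_cycles_saved(misses_before, misses_after, miss_penalty):
    """Stall cycles saved per instruction when i-cache misses drop.

    misses_before / misses_after are i-cache misses per instruction;
    miss_penalty is in cycles. All values are hypothetical inputs.
    """
    return (misses_before - misses_after) * miss_penalty


def throughput_gain_percent(base_cpi, misses_before, misses_after, miss_penalty):
    """Throughput gain (%) under the assumed additive CPI model."""
    cpi_before = base_cpi + misses_before * miss_penalty
    cpi_after = base_cpi + misses_after * miss_penalty
    return (cpi_before / cpi_after - 1.0) * 100.0


# Hypothetical example: misses fall from 0.02 to 0.01 per instruction.
for penalty in (10, 20, 40):
    saved = stall_cycles_saved(0.02, 0.01, penalty)
    gain = throughput_gain_percent(1.0, 0.02, 0.01, penalty)
    print(f"penalty={penalty} cycles: saved={saved:.2f} cycles/instr, gain={gain:.1f}%")
```

The same reduction in misses yields a larger gain as the penalty rises, which is consistent with the ordering of the three configurations described above.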




Journal title:
  • CoRR

Volume abs/1708.03900

Publication date 2017